Multi-task learning for text-dependent speaker verification
نویسندگان
چکیده
Text-dependent speaker verification uses short utterances and verifies both speaker identity and text contents. Due to this nature, traditional state-of-the-art speaker verification approaches, such as i-vector, may not work well. Recently, there has been interest of applying deep learning to speaker verification, however in previous works, standalone deep learning systems have not achieved state-of-the-art performance and they have to be used in system combination or as tandem features to obtain gains. In this paper, a novel multi-task deep learning framework is proposed for text-dependent speaker verification. First, multi-task deep learning is employed to learn both speaker identity and text information. With the learned network, utterance level average of the outputs of the last hidden layer, referred to as j-vector, means joint-vector, is extracted. Discriminant function, with classes defined as multi-task labels on both speaker and text, is then applied to the j-vectors as the decision function for the closed-set recognition, and Probabilistic Linear Discriminant Analysis (PLDA), with classes defined as on the multi-task labels, is applied to the j-vectors for the verification. Experiments on the RSR2015 corpus showed that the j-vector approach leads to good result on the evaluation data. The proposed multi-task deep learning system achieved 0.54% EER, 0.14% EER for the closed-set condition.
منابع مشابه
Deep feature for text-dependent speaker verification
Recently deep learning has been successfully used in speech recognition, however it has not been carefully explored and widely accepted for speaker verification. To incorporate deep learning into speaker verification, this paper proposes novel approaches of extracting and using features from deep learning models for text-dependent speaker verification. In contrast to the traditional short-term ...
متن کاملOn Residual CNN in Text-Dependent Speaker Verification Task
Deep learning approaches are still not very common in the speaker verification field. We investigate the possibility of using deep residual convolutional neural network with spectrograms as an input features in the text-dependent speaker verification task. Despite the fact that we were not able to surpass the baseline system in quality, we achieved a quite good results for such a new approach g...
متن کاملTandem deep features for text-dependent speaker verification
Although deep learning has been successfully used in acoustic modeling of speech recognition, it has not been thoroughly investigated and widely accepted for speaker verification. This paper describes an investigation of using various types of deep features in a Tandem fashion for text-dependent speaker verification. Three types of networks are used to extract deep features: restricted Boltzman...
متن کاملOn leveraging conversational data for building a text dependent speaker verification system
Recently we have investigated the use of state-of-the-art textindependent and text-dependent speaker verification algorithms for a text-dependent user authentication task and obtained satisfactory results mainly by using a fair amount of text-dependent development data. In our study, best results were obtained using the NAP framework rather than using the more advanced JFA and i-vector-based fr...
متن کاملImproving Robustness of Speaker Verification by Fusion of Prompted Text-Dependent and Text-Independent Operation Modalities
In this paper we present a fusion methodology for combining prompted text-dependent and text-independent speaker verification operation modalities. The fusion is performed in score level extracted from GMM-UBM single mode speaker verification engines using several machine learning algorithms for classification. In order to improve the performance we apply clustering of the score-based data befo...
متن کامل